@fultonj fultonj commented Dec 19, 2025

Problem:
OVN Controller agents on DCN compute nodes cannot connect to the OVN SB database
because the default ovncontroller-config ConfigMap points at the Kubernetes
ClusterIP service (tcp:ovsdbserver-sb.openstack.svc:6642), which is not routable
from external EDPM nodes on different network segments.

While DNS resolution works (via dnsmasq at 192.168.122.80), the resolved
ClusterIP cannot be reached from DCN sites which are on different internalapi
subnets (172.17.10.x for dcn1, 172.17.20.x for dcn2 vs central's 172.17.0.x).

This causes port binding failures when launching VMs in DCN availability zones:
"Binding failed for port, please check neutron logs for more information"

Evidence:

  • Central compute OVN Controller agents: Connected and working (`:-)` status)
  • DCN compute OVN Controller agents: NOT registered in OVN SB database
  • ovn-sbctl show shows only central computes and gateway, no DCN chassis

Root Cause:
Setting the edpm_ovn_dbs variable alone is insufficient because the edpm_ovn role
loads the ovncontroller-config ConfigMap data, which overrides the ovn-remote
setting. The default ConfigMap (created by the OVNDBCluster operator) uses the
ClusterIP.

Solution:

  1. Retrieve OVN SB internalapi IPs from pod annotations
  2. Create DCN-specific ConfigMap (ovncontroller-config-dcn) with direct IPs
  3. Create DCN-specific DataPlaneService (ovn-dcn) referencing this ConfigMap
  4. Patch dcn1/dcn2 nodesets to use ovn-dcn service instead of ovn

This ensures DCN nodes connect to OVN SB via routable internalapi IPs:
tcp:172.17.0.34:6642,tcp:172.17.0.36:6642,tcp:172.17.0.35:6642
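
As a rough illustration of steps 1 and 2, the sketch below derives an ovn-remote string from pod annotations. It assumes the internalapi IPs come from the Multus `k8s.v1.cni.cncf.io/network-status` annotation and that the attachment is named `internalapi`; the PR text only says "pod annotations", so both are assumptions, and the helper names `internalapi_ips` and `ovn_remote` are hypothetical, not taken from this change.

```python
import json


def internalapi_ips(network_status_json, network="internalapi"):
    """Return the IPs of attachments whose name matches `network`.

    Assumes the Multus network-status layout: a JSON list of objects
    with "name" (often "<namespace>/<attachment>") and "ips" keys.
    """
    ips = []
    for entry in json.loads(network_status_json):
        # Compare only the attachment name, ignoring any namespace prefix.
        if entry.get("name", "").split("/")[-1] == network:
            ips.extend(entry.get("ips", []))
    return ips


def ovn_remote(ips, port=6642, proto="tcp"):
    """Join IPs into a comma-separated ovn-remote connection string."""
    return ",".join(f"{proto}:{ip}:{port}" for ip in ips)


# Illustrative annotation for one OVN SB pod (made-up sample data).
sample = json.dumps([
    {"name": "ovn-kubernetes", "ips": ["10.217.0.50"]},
    {"name": "openstack/internalapi", "ips": ["172.17.0.34"]},
])

print(ovn_remote(internalapi_ips(sample)))  # tcp:172.17.0.34:6642
```

Collecting the IPs from all SB pods and passing them to `ovn_remote` would yield a string of the form shown above, which could then be stored in the DCN-specific ConfigMap.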

Co-Authored-By: Claude [email protected]
Signed-off-by: John Fulton [email protected]

openshift-ci bot commented Dec 19, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign jistr for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/a4d319b24b8342a787b83d63227f4077

✔️ noop SUCCESS in 0s
adoption-standalone-to-crc-ceph FAILURE in 2h 10m 50s
adoption-standalone-to-crc-no-ceph FAILURE in 2h 11m 36s

github-actions bot commented Jan 4, 2026

This PR is stale because it has been open for over 15 days with no activity.
Remove the stale label or add a comment, or this PR will be closed in 7 days.

@github-actions github-actions bot added the Stale label Jan 4, 2026
@fultonj fultonj force-pushed the dcn_adoption_pr_ovn_sb branch from 9635e47 to 3238e07 Compare January 5, 2026 20:53
@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/23a15a66fd964088a5c0e4c21ae73ca1

✔️ noop SUCCESS in 0s
adoption-standalone-to-crc-ceph FAILURE in 2h 05m 21s
adoption-standalone-to-crc-no-ceph FAILURE in 2h 06m 22s

@github-actions github-actions bot removed the Stale label Jan 6, 2026
@fultonj fultonj force-pushed the dcn_adoption_pr_ovn_sb branch from 3238e07 to 1c8bf1d Compare January 15, 2026 16:14
@fultonj fultonj changed the title Do Not Merge: Dcn adoption pr ovn sb Configure OVN SB direct IPs for DCN nodesets Jan 15, 2026
@fultonj fultonj marked this pull request as ready for review January 15, 2026 16:16
@fultonj fultonj requested review from abays, jistr and olliewalsh and removed request for abays January 15, 2026 16:35